Published as a conference paper at ICLR 2018
On the Convergence of Adam and Beyond
Abstract
Several recently proposed stochastic optimization methods that have been successfully used in training deep networks, such as RMSPROP, ADAM, ADADELTA, and NADAM, are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g. learning with large output spaces, it has been empirically observed that these algorithms fail to converge to an optimal solution (or a critical point in nonconvex settings). We show that one cause for such failures is the exponential moving average used in the algorithms. We provide an explicit example of a simple convex optimization setting where ADAM does not converge to the optimal solution, and describe the precise problems with the previous analysis of the ADAM algorithm. Our analysis suggests that the convergence issues can be fixed by endowing such algorithms with “long-term memory” of past gradients, and we propose new variants of the ADAM algorithm which not only fix the convergence issues but often also lead to improved empirical performance.
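The mechanism described in the abstract can be illustrated with a minimal sketch for a single scalar parameter. It is not the paper's algorithm listing. The function names, the `state` dictionary, and the hyperparameter values are assumptions chosen for illustration, and bias correction is omitted for brevity. "Long-term memory" is realized here by keeping a running maximum of the second-moment estimate, which is one way to prevent the scaling term from shrinking when recent gradients are small.

```python
import math


def scaled_step(grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8,
                long_term_memory=False):
    """Return the update for one scalar parameter; mutates `state` in place.

    Hypothetical helper for illustration only (no bias correction).
    """
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad * grad

    if long_term_memory:
        # Assumed variant: remember the largest second-moment estimate seen.
        state["v_hat"] = max(state["v_hat"], state["v"])
        denom = state["v_hat"]
    else:
        # Plain moving average: old large gradients are forgotten quickly.
        denom = state["v"]

    return -lr * state["m"] / (math.sqrt(denom) + eps)


def run(grads, long_term_memory):
    """Feed a gradient sequence and record the scaling term after each step."""
    state = {"m": 0.0, "v": 0.0, "v_hat": 0.0}
    denoms = []
    for g in grads:
        scaled_step(g, state, beta2=0.5, long_term_memory=long_term_memory)
        denoms.append(state["v_hat"] if long_term_memory else state["v"])
    return denoms


# One large, informative gradient followed by several small ones.
grads = [10.0, 0.1, 0.1, 0.1]
ema_only = run(grads, long_term_memory=False)
with_memory = run(grads, long_term_memory=True)

print("moving average only:", ema_only)
print("with running maximum:", with_memory)
```

With the moving average alone, the scaling term decays after the large gradient, so the effective step size `lr / sqrt(v)` grows again. With the running maximum, the scaling term never decreases, so the effective step size never increases. The small `beta2=0.5` is used only to make the decay visible within a few steps.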